ABSTRACT


Given a sample set of independent, identically distributed
real-valued random variables, each with the unknown probability
density function $f(\cdot)$, the problem considered is to estimate $f$ from
the sample set. The function $f$ is assumed to be in $L_2(a,b)$; $f$ is not
assumed to be in any parametric family. This paper constructs an
adaptive "two-pass" solution to the problem: In a pre-processing
step (the first pass), a preliminary rough estimate of $f$ is obtained
by means of a standard orthogonal-series estimator. In the second
pass, the preliminary estimate is used to transform the orthogonal
series. The new, transformed orthogonal series is then used to obtain
the final estimate. The paper establishes consistency of the
estimator and derives asymptotic (large sample set) estimates of the
bias and variance. It is shown that the adaptive estimator offers
reduced bias (better resolution) in comparison to the conventional
orthogonal series estimator. Computer simulations are presented
which demonstrate the small sample set behavior. A case study of a
bimodal density confirms the theoretical conclusions.

I.A.  Introduction

A real random variable (r.v.) $X$ is characterized by the
associated cumulative distribution function (c.d.f.)

1)  $F(x) = \Pr\{X \le x\}$.

If the measure induced on $\mathbb{R}$ by $F$ is absolutely continuous with
respect to Lebesgue measure, then we may define the probability
density function (p.d.f.) $f(\cdot)$ as

2)  $f(x) \triangleq F'(x)$,

the Radon-Nikodym derivative of $F$.

In many statistical situations, the p.d.f. is not known a priori,
and the investigator must estimate $f$ from a sample set $\{X_1, \ldots, X_N\}$,
where each $X_j$ is independent with density $f(\cdot)$.

In many cases, mathematical analysis or physical theory leads
to the conclusion that $f$ belongs to some class of functions which are
characterized by some parameters $p_1, \ldots, p_r$. Then the investigator
must only determine the values of the $r$ parameters. This is called
"parametric estimation." An example is the frequently occurring
case where $X$ is a Gaussian r.v.; then only the mean

$\mu = E[X]$

and variance

$\sigma^2 = E[(X - \mu)^2]$

are required to characterize $X$.

However, in many situations, the p.d.f. $f$ belongs to no known
parametric class. This situation may arise when the underlying
physical mechanism generating $X$ is unknown or extremely complicated.
In this case the investigator must estimate the entire function $f(\cdot)$
rather than a vector of parameters. This task is known as
"non-parametric" estimation.

Several techniques of non-parametric estimation have been
proposed by a number of researchers. These will be reviewed below.

The current work concerns a modification to one of these
techniques, namely the orthogonal-series estimator. We propose a
prior transformation of the orthogonal series which "tunes" the
series to the given sample set. The effect of the transformation is
to reduce the bias of the estimator for a sample set of a given size
$N$. The transformation is obtained from a pre-processing step wherein
we examine the sample set before applying the estimator.

I.B.  Summary of Previous Approaches

One of the earliest and most widely studied non-parametric
density function estimators was introduced by M. Rosenblatt [1] in
1955. He proposed the kernel-type estimator

$\hat f(x) = \frac{1}{Nh} \sum_{j=1}^{N} K\!\left(\frac{x - X_j}{h}\right)$

where $K(\cdot)$ is a given kernel function and $h = h(N)$ is a scaling factor
depending on the sample size $N$. The estimator was further studied
by E. Parzen [2] in 1961.
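As an illustration of the kernel-type estimator just described, the following is a minimal sketch in Python. The Gaussian kernel, the bandwidth `h = 0.1`, and the five-point sample are assumptions chosen only for the example; the text leaves $K$ and $h(N)$ unspecified.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as an example kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kernel_estimate(x, sample, h, kernel=gaussian_kernel):
    """Kernel-type estimate (1 / (N h)) * sum_j K((x - X_j) / h)."""
    n = len(sample)
    return sum(kernel((x - xj) / h) for xj in sample) / (n * h)

def integrate(func, a, b, steps=2000):
    """Composite trapezoidal rule on [a, b]."""
    dx = (b - a) / steps
    total = 0.5 * (func(a) + func(b))
    total += sum(func(a + i * dx) for i in range(1, steps))
    return total * dx

# Hypothetical sample, for illustration only.
sample = [0.1, 0.4, 0.45, 0.5, 0.9]
mass = integrate(lambda x: kernel_estimate(x, sample, h=0.1), -2.0, 3.0)
```

Because the kernel itself integrates to one, the estimate is a density for any sample; `mass` checks this numerically.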

G.S. Watson and M.R. Leadbetter [3] investigated optimal choices for
the kernel shape $K(\cdot)$. A particular kernel shape offering attractive
theoretical and practical properties was obtained by J.O. Bennett,
R.J.P. de Figueiredo, and J.R. Thompson [4] with the use of B-splines.
K.B. Davis [5] studied a kernel which is not in $L_1$ and demonstrated
superior asymptotic properties; numerical trials with small sample
sizes show poor performance, however [6]. Convergence conditions for
kernel estimators [7] and related nearest neighbor estimators [8]
were studied by L.P. Devroye and T.J. Wagner.

Another type of estimator, using an orthogonal series expansion,
was introduced by R. Kronmal and M. Tarter [9], Cencov [10], van
Ryzin [11], and Schwartz [12]; they developed error estimates and
optimal series approximations. The optimal results require knowledge
of the unknown density $f$. H.D. Brunk [13] considered ways of
extracting the needed knowledge from the sample itself.

A totally different approach was taken by G.F. de Montricher,
R.A. Tapia, and J.R. Thompson [14]. In this theoretical paper, the
density estimate is the one which maximizes a penalized likelihood
function. A discretized numerical implementation by D. Scott [21]
gave excellent small-sample performance. An earlier effort along
these lines is that of I.J. Good and R.A. Gaskins [15].

A. Wragg and D.C. Dowson [16] use the information-theoretic
concept of entropy to fit density functions to a truncated moment
sequence. Grace Wahba [17] and P. Whittle [18] employ notions from
stochastic processes to obtain "optimally-smoothed" density estimates.

I.C.  Summary of Results

In section II, we take a close look at the orthogonal series-type
estimator, and develop asymptotic error analysis for the special
case of the Fourier series estimator. In section III, we introduce
a new data-adaptive modification of the Fourier series estimator.
The series is modified with a transformation derived from a
pre-processing step. The modified series reduces the bias of the
estimator for a sample set of given size $N$. We develop the asymptotic
error analysis of the estimator and produce consistency results.
Finally, in section IV we examine some computer simulations to study
the behavior of the estimator on small sample sets.

I.D.  Notation and Conventions

Throughout this paper we will assume the following notation
and conventions.

1)  $X$ is a real-valued random variable with probability density
function (p.d.f.) $f(\cdot)$.

2)  We are given a sample set of size $N$, $\{X_1, \ldots, X_N\}$,
where each $X_j$ is an independent realization of $X$.

3)  The expected value of $X$ is denoted by $E[X]$ and the square
of $E[X]$ by $(E[X])^2$. The notation $E[X]^2$ is the same as
$E[X^2]$.

4)  The asterisk $z^*$ denotes complex conjugate.

5)  The symbol □ denotes the end of a proof.

II.A.  Series-type Estimators

Consider a (Lebesgue) integrable function $g$ defined on the
interval $(a,b)$. Let $g$ satisfy $g(x) > 0$ almost everywhere for
$x$ in $(a,b)$ and $\int_a^b g(x)\,dx = 1$.

We can define $L_2(g)$, the class of square-integrable functions
weighted by $g$:

1)  $L_2(g) = \{ s : (a,b) \to \mathbb{R} \mid \int_a^b s(x)^2 g(x)\,dx < \infty \}$.

Furthermore, let there be given $\{u_k\}_{k=0}^{\infty}$, a complete orthonormal
family in $L_2(g)$.

Suppose that $f(\cdot)$, the p.d.f. of the random variable $X$, is
such that $f/g$ is in $L_2(g)$. Then $f$ may be expanded as

2)  $f(x) = g(x) \sum_{k=0}^{\infty} b_k u_k(x)$.

By orthogonality, we can see

$E[u_j(X)] = \int_a^b u_j(x) f(x)\,dx = \int_a^b u_j(x)\, g(x) \sum_{k=0}^{\infty} b_k u_k(x)\,dx = b_j$.

Now an estimator for $b_k$ is

3)  $\hat b_k = \frac{1}{N} \sum_{j=1}^{N} u_k(X_j)$.

Thus we can construct an estimate of $f$ by

4)  $\hat f(x) = g(x) \sum_{k=0}^{n} \hat b_k u_k(x)$

for some $n < \infty$.
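The estimator (3)-(4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weight is taken to be uniform, $g(x) = 1$ on $(0,1)$, and the family $u_0 = 1$, $u_k(x) = \sqrt{2}\cos(\pi k x)$ is one convenient orthonormal choice for that weight; neither choice comes from the text, and the sample is hypothetical.

```python
import math

def u(k, x):
    """Cosine basis on (0, 1), orthonormal for the uniform weight g(x) = 1."""
    return 1.0 if k == 0 else math.sqrt(2.0) * math.cos(math.pi * k * x)

def coefficients(sample, n):
    """b_hat_k = (1/N) * sum_j u_k(X_j) for k = 0..n, as in (3)."""
    size = len(sample)
    return [sum(u(k, xj) for xj in sample) / size for k in range(n + 1)]

def series_estimate(x, b_hat):
    """f_hat(x) = g(x) * sum_k b_hat_k u_k(x) with g(x) = 1, as in (4)."""
    return sum(b * u(k, x) for k, b in enumerate(b_hat))

# Hypothetical sample on (0, 1), for illustration only.
sample = [0.12, 0.30, 0.33, 0.41, 0.48, 0.52, 0.55, 0.61, 0.70, 0.86]
b_hat = coefficients(sample, n=3)
```

With this basis the zeroth coefficient is always one, so the truncated series integrates to one over $(0,1)$ whatever the value of $n$; it need not be non-negative, however.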

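It is easy to derive error expressions for (4) in terms of the
coefficients in the expansion (2). A convenient error measure is

5)  $\int_a^b \frac{E[\hat f(x) - f(x)]^2}{g(x)}\,dx$,

which by orthonormality of the $u_k$ equals

6)  $\frac{1}{N} \sum_{k=0}^{n} \mathrm{var}[u_k(X)] + \sum_{k=n+1}^{\infty} b_k^2$.

In (6) the first term is the variance term and the second term is the
bias term.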

A desirable property of any estimator is asymptotic consistency,
which, loosely speaking, means that as the size of the sample set
increases, the error decreases. To sharpen this notion, we define
several types of asymptotic consistency.

7)  Definition

Let $\hat f_N$ be an estimator for $f$ given a sample set of size $N$.
Let $x_0$ be in $(a,b)$.

If $E[\hat f_N(x_0) - f(x_0)]^2 \to 0$ as $N \to \infty$, then $\hat f_N$ is
"asymptotically consistent in the mean square sense at $x_0$."

If $\int_a^b E[\hat f_N(x) - f(x)]^2\,dx \to 0$ as $N \to \infty$, then $\hat f_N$ is
"asymptotically consistent in the integrated mean square sense."

If for every $\varepsilon > 0$ there is an $N_\varepsilon$ such that for $N > N_\varepsilon$ we have
$\Pr\{ |\hat f_N(x_0) - f(x_0)| > \varepsilon \} < \varepsilon$, then $\hat f_N$ is "asymptotically consistent
in probability at $x_0$."

The definition of the estimator (4) is not complete, since we
have not specified the choice of $n$. Let us choose $n = n(N)$ as a
function of $N$ in such a way that

8.1)  $n(N) \to \infty$ as $N \to \infty$,

8.2)  and $n(N)/N \to 0$ as $N \to \infty$.

If we assume that there is a uniform bound $B$ such that

$\mathrm{var}[u_k(X)] < B$,  $k = 0, 1, 2, \ldots$

then a simple argument shows that with choice (8), the estimator (4) is
asymptotically consistent in the integrated mean square sense.

The precise dependence of $n(N)$ on $N$ is here left deliberately
vague. Optimal choices are investigated in [9].

An often-studied extension of (4) is

7)  $\hat f(x) = g(x) \sum_{k=0}^{\infty} w_k(h)\, \hat b_k u_k(x)$

where $\{w_k(h)\}_{k=0}^{\infty}$ is a sequence of weights parameterized by a
positive parameter $h$. We choose the weights so that

8.1)  $w_k(h) \to 0$ as $k \to \infty$

8.2)  $w_k(h) \to 1$ as $h \to 0$.

Optimal choices of the weight sequence $\{w_k\}$ have been
studied in [13]. Briefly, the optimal functional form of $w_k$
depends on $f$, and the choice $h = h(N)$ depends on the sample set size.


II.B.  Fourier Series Estimators

The Fourier series estimator, a special case of (II.A.4),
has been studied extensively by Kronmal and Tarter [9]. They were
interested primarily in integrated mean square error and optimal
truncation point $n$ for the estimator. We shall be concerned here and
later with the pointwise mean square error, $E[\hat f(x_0) - f(x_0)]^2$. The
following development in this section is new, although it follows
somewhat in the spirit of [1] and [2].

From now on we will assume that $f$ has its support on a finite
interval $[a,b]$. The error introduced by this assumption is small in
comparison to the bias and variance components to be analyzed later.
Furthermore, we will take $a = 0$, $b = 1$. This is done for technical
convenience, since a simple linear scaling and translation will return
us to the general case $[a,b]$.

Let $\{w_k(\cdot)\}_{k=-\infty}^{\infty}$ be a sequence of (complex) functions of
a real positive variable $h$. Consider the estimator given by

1.1)  $\hat f(x) = \sum_{k=-\infty}^{\infty} w_k(h)\, \hat b_k \exp(2\pi i k x)$,

1.2)  $\hat b_k = \frac{1}{N} \sum_{j=1}^{N} \exp(-2\pi i k X_j)$.

We are interested in the behavior of this estimator for large $N$.
In particular, we will derive asymptotic estimates of $\mathrm{var}[\hat f(x_0)]$
and $\mathrm{bias}[\hat f(x_0)]$ for $x_0 \in [0,1]$.

It is clear that the behavior of $\hat f$ depends greatly on the choice
of $\{w_k(\cdot)\}_{k=-\infty}^{\infty}$. We make a digression to study
some properties of $\{w_k(\cdot)\}_{k=-\infty}^{\infty}$ which we will then use to answer
questions about $\hat f$.

2)  Lemma

Let $\{w_k(\cdot)\}_{k=-\infty}^{\infty}$ be a weight sequence.

Suppose for each $h > 0$

$\sum_{k=-\infty}^{\infty} |w_k(h)|^2 < \infty$ and

for each $k$, $w_k(h) = w_{-k}(h)^*$.

Then the kernel defined by

2.1)  $K_h(x) = \sum_{k=-\infty}^{\infty} w_k(h) \exp(2\pi i k x)$

is a real periodic function in $L_2[0,1]$ with period 1.
Moreover, the estimator (1) may be written as

2.2)  $\hat f(x) = \frac{1}{N} \sum_{j=1}^{N} K_h(x - X_j)$.

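Proof

Statement (2.1) is immediate.
For (2.2), notice

$\hat f(x) = \sum_{k=-\infty}^{\infty} w_k(h)\, \hat b_k \exp(2\pi i k x)$

$= \sum_{k=-\infty}^{\infty} w_k(h) \left[ \frac{1}{N} \sum_{j=1}^{N} \exp(-2\pi i k X_j) \right] \exp(2\pi i k x)$

$= \frac{1}{N} \sum_{j=1}^{N} \sum_{k=-\infty}^{\infty} w_k(h) \exp(2\pi i k x - 2\pi i k X_j)$

$= \frac{1}{N} \sum_{j=1}^{N} K_h(x - X_j)$.

□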
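The equivalence of the series form (1.1) and the kernel form (2.2) can be checked numerically. The sketch below assumes an example weight $w_k(h) = \exp(-h k^2)$, which is real and even in $k$ as lemma (2) requires, and truncates the sums at $|k| \le$ `kmax`; both the weight and the truncation are assumptions of the example, not choices made in the text.

```python
import cmath
import math

def weight(k, h):
    """Example weight w_k(h) = exp(-h k^2): real, even in k, -> 1 as h -> 0."""
    return math.exp(-h * k * k)

def b_hat(k, sample):
    """Fourier coefficient estimate (1.2)."""
    return sum(cmath.exp(-2j * math.pi * k * xj) for xj in sample) / len(sample)

def series_form(x, sample, h, kmax):
    """Estimator (1.1), truncated to |k| <= kmax."""
    return sum(weight(k, h) * b_hat(k, sample) * cmath.exp(2j * math.pi * k * x)
               for k in range(-kmax, kmax + 1))

def kernel(x, h, kmax):
    """Kernel (2.1), truncated to |k| <= kmax; real because w_k = w_{-k}."""
    return 1.0 + 2.0 * sum(weight(k, h) * math.cos(2.0 * math.pi * k * x)
                           for k in range(1, kmax + 1))

def kernel_form(x, sample, h, kmax):
    """Estimator written as (2.2): the average of K_h(x - X_j)."""
    return sum(kernel(x - xj, h, kmax) for xj in sample) / len(sample)

# Hypothetical sample on [0, 1], for illustration only.
sample = [0.15, 0.32, 0.40, 0.47, 0.58, 0.81]
lhs = series_form(0.37, sample, h=0.02, kmax=30)
rhs = kernel_form(0.37, sample, h=0.02, kmax=30)
```

The two values agree to rounding error, and the imaginary part of `lhs` vanishes, because the identity is purely algebraic and holds term by term for any truncation.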


Expression (2.2) has a form similar to that of the Parzen
kernel estimator (see [2]). However, in the present case $K_h(\cdot)$ is
a periodic kernel and does not depend on $h$ as a simple scale factor.
The dependence on $h$ is more complicated, and this dependence must
be conditioned for the estimator to behave properly.

Henceforth we will assume that the weight sequence satisfies the
following:


3)  Conditions

3.1)  $\{w_k(\cdot)\}_{k=-\infty}^{\infty}$ satisfies the hypothesis of lemma (2).

Moreover, $K_h(x) = \sum_{k=-\infty}^{\infty} w_k(h) \exp(2\pi i k x)$ satisfies

3.2)  $K_h(x) \ge 0$

3.3)  $K_h(-x) = K_h(x)$

3.4)  $\int_{-1/2}^{1/2} K_h(x)\,dx = 1$

3.5)  $K_h(x)$ is pointwise continuous in $h > 0$ and $x$.

3.6)  $\int_{-1/2}^{1/2} K_h(x)\, x^2\,dx \to 0$ as $h \to 0$.

3.7)  Let $\varepsilon > 0$. Then

$\frac{\int_{\varepsilon}^{1/2} K_h(x)\, x^2\,dx}{\int_{0}^{1/2} K_h(x)\, x^2\,dx} \to 0$ as $h \to 0$.

3.8)  Let $1/2 > \varepsilon > 0$. Then there exists a constant $M > 0$ (depending on $\varepsilon$) such that

$\int_{\varepsilon}^{1/2} K_h(x)^2\,dx < M$ as $h \to 0$.

Under assumptions (3) it is possible to establish some limits
which will arise shortly in the asymptotic error analysis. The proof is
a straightforward though lengthy analysis and is omitted. (The omitted proofs
may be found in [23].)


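4)  Lemma

Under the assumptions of conditions (3), we have

4.1)  $\int_{-1/2}^{1/2} K_h(x)^2\,dx \to \infty$ as $h \to 0$

4.2)  $\frac{\int_{-1/2}^{1/2} K_h(x)\, |x|^3\,dx}{\int_{-1/2}^{1/2} K_h(x)\, x^2\,dx} \to 0$ as $h \to 0$

4.3)  $\frac{\int_{-1/2}^{1/2} K_h(x)^2\, x^2\,dx}{\int_{-1/2}^{1/2} K_h(x)^2\,dx} \to 0$ as $h \to 0$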

Two of the quantities are important enough to merit specific
notation which will be used extensively.

5)  Definition

For a kernel $K_h$ let

$c(h) \triangleq \frac{1}{2} \int_{-1/2}^{1/2} K_h(x)\, x^2\,dx$

$v(h) \triangleq \int_{-1/2}^{1/2} K_h(x)^2\,dx$

We require one further lemma about these quantities.

6)  Lemma

6.1)  $v(h)$ and $c(h)$ are continuous in $h > 0$.

6.2)  For every $N$ sufficiently large, there is an $h_N$ such that

$\frac{v(h_N)}{c(h_N)^2} = N$.

6.3)  If $h_N$ is chosen by (6.2), then

$\frac{v(h_N)}{N} + c(h_N)^2 \to 0$ as $N \to \infty$.

Proof

The first statement follows from condition (3.5) and the
compactness of the interval of integration.

Since $v(h) \to \infty$ and $c(h) \to 0$ as $h \to 0$, it is clear that
$v(h)/c(h)^2 \to \infty$ and is a continuous function. Hence (6.2) follows.
With $h_N$ chosen by (6.2),

$\frac{v(h_N)}{N} + c(h_N)^2 = 2 c(h_N)^2 \to 0$ as $h_N \to 0$.

□
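To make lemma (6) concrete, the following sketch evaluates $c(h)$ and $v(h)$ by numerical integration and solves $v(h)/c(h)^2 = N$ by bisection. It assumes the example weight $w_k(h) = \exp(-h k^2)$ truncated at `KMAX` terms, a trapezoidal rule, and a bracketing interval for $h$; all of these are illustrative assumptions rather than choices made in the text.

```python
import math

KMAX = 30  # truncation of the Fourier sum (an assumption of this sketch)

def weight(k, h):
    """Example weight w_k(h) = exp(-h k^2) (an assumption)."""
    return math.exp(-h * k * k)

def kernel(x, h):
    """Truncated periodic kernel K_h(x) = 1 + 2 sum_k w_k(h) cos(2 pi k x)."""
    return 1.0 + 2.0 * sum(weight(k, h) * math.cos(2.0 * math.pi * k * x)
                           for k in range(1, KMAX + 1))

def trapezoid(func, a, b, steps=240):
    """Composite trapezoidal rule on [a, b]."""
    dx = (b - a) / steps
    total = 0.5 * (func(a) + func(b))
    total += sum(func(a + i * dx) for i in range(1, steps))
    return total * dx

def c(h):
    """c(h) = (1/2) * integral of K_h(x) x^2 over [-1/2, 1/2]."""
    return 0.5 * trapezoid(lambda x: kernel(x, h) * x * x, -0.5, 0.5)

def v(h):
    """v(h) = integral of K_h(x)^2 over [-1/2, 1/2]."""
    return trapezoid(lambda x: kernel(x, h) ** 2, -0.5, 0.5)

def solve_h(n, lo=0.01, hi=5.0, iterations=40):
    """Bisection on a log scale for v(h) / c(h)^2 = n, as in (6.2)."""
    target = float(n)
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if v(mid) / c(mid) ** 2 > target:
            lo = mid   # ratio too large, so h must increase
        else:
            hi = mid
    return math.sqrt(lo * hi)

h_n = solve_h(10_000)
```

For this particular kernel, as $h$ grows the kernel flattens toward the constant 1, so $v \to 1$, $c \to 1/24$, and the ratio tends to 576. Equation (6.2) therefore has no solution for small $N$, which illustrates why the lemma asks for $N$ sufficiently large.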
Now we are ready to state the main theorem of this section.
Although the proof follows the spirit of Rosenblatt [1], the result
is original for Fourier series estimators. Before now, all error
estimates for series estimators were of the integral type
$\int_0^1 E[\hat f(x) - f(x)]^2\,dx$. The following result gives estimates
of local type $E[\hat f(x_0) - f(x_0)]^2$. This is an important step in
the later construction of the modified estimator which adapts to the
local properties of $f$.

To aid in the proof we introduce $\tilde f$, the periodic extension of
$f$, defined by

$\tilde f(x + k) = f(x)$

where $x \in [0,1]$ and $k$ is an integer.

8)  Theorem

Suppose

8.1)  $f \in C^3[0,1]$ and vanishes in a neighborhood of the
end points;

8.2)  $\hat f$ is defined for $x \in [0,1]$ and $h > 0$ by

$\hat f(x) = \sum_{k=-\infty}^{\infty} w_k(h)\, \hat b_k \exp(2\pi i k x)$

$\hat b_k = \frac{1}{N} \sum_{j=1}^{N} \exp(-2\pi i k X_j)$;

8.3)  The sequence $\{w_k(\cdot)\}_{k=-\infty}^{\infty}$ satisfies conditions (3).

Then for $x_0 \in [0,1]$,

8.4)  $\lim_{h \to 0} \frac{E[\hat f(x_0)] - f(x_0)}{c(h)} = f''(x_0)$.

If, furthermore, we choose $h = h_N$ as a function of $N$ in such a way that
$h_N \to 0$ as $N \to \infty$, then

$\lim_{N \to \infty} \frac{N\, \mathrm{var}[\hat f(x_0)]}{v(h_N)} = f(x_0)$.


Proof

We can write

$\hat f(x) = \frac{1}{N} \sum_{j=1}^{N} K_h(x - X_j)$

where $K_h(\cdot)$ is the kernel associated with $\{w_k(\cdot)\}_{k=-\infty}^{\infty}$.

Since the samples are identically distributed,

$E[\hat f(x_0)] = E[K_h(x_0 - X)] = \int_0^1 K_h(x_0 - y) f(y)\,dy = \int_{-1/2}^{1/2} K_h(y)\, \tilde f(x_0 + y)\,dy$

where $\tilde f$ is the periodic extension of $f$. Since $f$ vanishes in a
neighborhood of the end points of $[0,1]$, $\tilde f$ also has three
continuous derivatives. Hence we can invoke Taylor's theorem with
remainder and expand

$E[\hat f(x_0)] = \int_{-1/2}^{1/2} K_h(y) \left[ f(x_0) + f'(x_0)\, y + \frac{1}{2} f''(x_0)\, y^2 + \frac{1}{3!} \tilde f'''(z(y))\, y^3 \right] dy$

where $z(y)$ lies between $x_0$ and $x_0 + y$.

By conditions (3.3) and (3.4), this reduces to

$E[\hat f(x_0)] = f(x_0) + f''(x_0)\, c(h) + \int_{-1/2}^{1/2} K_h(y)\, \frac{y^3}{3!}\, \tilde f'''(z(y))\,dy$.

Now

$\left| \frac{E[\hat f(x_0)] - f(x_0)}{c(h)} - f''(x_0) \right| = \frac{1}{c(h)} \left| \int_{-1/2}^{1/2} K_h(y)\, \frac{y^3}{3!}\, \tilde f'''(z(y))\,dy \right| \le \frac{1}{3!} \sup_{x \in [0,1]} |f'''(x)|\, \frac{\int_{-1/2}^{1/2} K_h(y)\, |y|^3\,dy}{c(h)}$

and this $\to 0$ as $h \to 0$ by lemma (4). This establishes (8.4).

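Again by independence of the samples,

$\mathrm{var}[\hat f(x_0)] = \frac{1}{N}\, \mathrm{var}[K_h(x_0 - X)]$.

Using the same extension and expansion, we have

$\int_0^1 K_h(x_0 - y)^2 f(y)\,dy = \int_{-1/2}^{1/2} K_h(y)^2 \left[ f(x_0) + f'(x_0)\, y + \frac{1}{2} \tilde f''(z(y))\, y^2 \right] dy$

$= v(h)\, f(x_0) + \frac{1}{2} \int_{-1/2}^{1/2} K_h(y)^2\, y^2\, \tilde f''(z(y))\,dy$.

Thus

$\left| \frac{N\, \mathrm{var}[\hat f(x_0)]}{v(h)} - f(x_0) \right| = \frac{1}{v(h)} \left| \frac{1}{2} \int_{-1/2}^{1/2} K_h(y)^2\, y^2\, \tilde f''(z(y))\,dy - (E[\hat f(x_0)])^2 \right|$

$\le \frac{1}{2} \sup_{x \in [0,1]} |f''(x)|\, \frac{\int_{-1/2}^{1/2} K_h(y)^2\, y^2\,dy}{v(h)} + \frac{(E[\hat f(x_0)])^2}{v(h)}$.

Now if $h = h_N \to 0$ as $N \to \infty$, then these two terms go to zero by
lemma (4). This completes the proof.

□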


Thus we have approximately for large $N$,

$E[\hat f(x_0) - f(x_0)]^2 \approx \frac{f(x_0)}{N}\, v(h_N) + f''(x_0)^2\, c(h_N)^2$.


An obvious consequence is the following:

9)  Corollary

Under the hypothesis of theorem (8), suppose we choose $h_N$ to
solve

$\frac{v(h_N)}{c(h_N)^2} = N$.

Then $\hat f$ is asymptotically consistent in the mean square sense at $x_0$.
That is,

$E[\hat f(x_0) - f(x_0)]^2 \to 0$ as $N \to \infty$.

Proof

By lemma (6), $\frac{v(h_N)}{N} + c(h_N)^2 \to 0$.

Thus asymptotically,

$E[\hat f(x_0) - f(x_0)]^2 \approx \frac{f(x_0)}{N}\, v(h_N) + f''(x_0)^2\, c(h_N)^2$

also goes to zero.

III.  A Data-Adaptive Estimator

A.  Motivation


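Recall the simple form of the estimator (II.A.4)

$\hat f(x) = g(x) \sum_{k=0}^{n} \hat b_k u_k(x)$

with the integrated variance

$\int_a^b \frac{\mathrm{var}[\hat f(x)]}{g(x)}\,dx = \frac{1}{N} \sum_{k=0}^{n} \mathrm{var}[u_k(X)]$

and integrated bias squared

$\int_a^b \frac{(E[\hat f(x)] - f(x))^2}{g(x)}\,dx = \sum_{k=n+1}^{\infty} b_k^2$.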

We see that for fixed $N$ and increasing $n$, the bias decreases but the
variance increases. For samples of moderate size (say $N = 100$), we
may not take more than a few terms in the series before the variance
overwhelms us. Thus we must hope that $f$ may be well approximated
by the first few terms in the expansion. Ideally, we would like to
choose a family $\{u_k\}_{k=0}^{\infty}$ for which this occurs.

It is impossible to select a fixed family $\{u_k\}$ which
works well for all functions $f$. So let us consider the following
adaptive strategy. From the sample set $\{X_1, \ldots, X_N\}$ we will extract
certain information about $f$. We use this information to fashion a
family $\{u_k\}$ adapted to $f$. We will then use this family to
obtain an estimate of $f$.

B.  Construction of the Estimator

Let us consider a way of transforming a given orthogonal family
into a new orthogonal family. We start with the Fourier functions
$\{\exp(2\pi i k x)\}_{k=-\infty}^{\infty}$, orthonormal on $[0,1]$. Suppose that we
have a transformation $G$ satisfying

1.1)  $G : [0,1] \to [0,1]$

1.2)  $G$ is one-to-one, onto, strictly increasing

1.3)  $g(x) = G'(x)$ is continuous.

We can then define

2)  $u_k(x) = \exp(2\pi i k\, G(x))$

for $-\infty < k < \infty$.

It is easily seen by a change of variable $t = G(x)$,

$\int_0^1 u_j(x)\, u_k(x)^*\, g(x)\,dx = \int_0^1 \exp(2\pi i (j G(x) - k G(x)))\, g(x)\,dx = \int_0^1 \exp(2\pi i (jt - kt))\,dt = \delta_{jk}$,

that the family $\{u_k\}_{k=-\infty}^{\infty}$ is orthonormal with respect to $g$ on
$[0,1]$. This immediately yields a series-type estimator considered
earlier:

3.1)  $\hat f(x) = g(x) \sum_{k=-\infty}^{\infty} w_k(h)\, \hat b_k u_k(x)$

3.2)  $\hat b_k = \frac{1}{N} \sum_{j=1}^{N} \exp(-2\pi i k\, G(X_j))$

Thus a transformation $G$ provides us with a new estimator.
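The orthonormality of the transformed family with respect to $g$ can be checked numerically. The sketch below uses a hypothetical transformation $G(x) = x - \frac{A}{2\pi}\sin(2\pi x)$ with $A = 0.5$, so that $g(x) = 1 - A\cos(2\pi x) > 0$; this particular $G$ is an assumption chosen only because it satisfies (1.1)-(1.3).

```python
import cmath
import math

A = 0.5  # amplitude of the example transformation (hypothetical choice)

def g(x):
    """Derivative of G; strictly positive because |A| < 1."""
    return 1.0 - A * math.cos(2.0 * math.pi * x)

def G(x):
    """Strictly increasing map of [0, 1] onto [0, 1] with G' = g."""
    return x - A * math.sin(2.0 * math.pi * x) / (2.0 * math.pi)

def u(k, x):
    """Transformed Fourier function u_k(x) = exp(2 pi i k G(x)), as in (2)."""
    return cmath.exp(2j * math.pi * k * G(x))

def inner(j, k, steps=2000):
    """Midpoint-rule approximation of the g-weighted inner product of u_j and u_k."""
    total = 0j
    for i in range(steps):
        x = (i + 0.5) / steps
        total += u(j, x) * u(k, x).conjugate() * g(x)
    return total / steps

same = inner(2, 2)
different = inner(1, 3)
```

Here `same` is close to 1 and `different` is close to 0, as the change of variable $t = G(x)$ predicts.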

We will show later that if $G(x) \approx \int_0^x f(y)\,dy$ (that is, if $g \approx f$),
then the new family $\{u_k\}_{k=-\infty}^{\infty}$ provides an improved estimate. We
cannot choose $G$ a priori, of course, since knowledge of $G$ is equivalent
to knowledge of $f$. However, we can estimate $G$ from the sample. We
propose the following algorithm.

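4)  Adaptive (or Two-Pass) Estimator

Choose $h_1 > 0$, $h_2 > 0$, $N_1$, and $N_2$
so that $N_1 + N_2 = N$.

4.1)  Let

$\hat g(x) = \sum_{k=-\infty}^{\infty} w_k(h_1)\, \hat a_k \exp(2\pi i k x)$

$\hat a_k = \frac{1}{N_1} \sum_{j=1}^{N_1} \exp(-2\pi i k X_j)$

$\hat G(x) = \int_0^x \hat g(y)\,dy$

4.2)  $\hat f(x) = \hat g(x) \sum_{k=-\infty}^{\infty} w_k(h_2)\, \hat b_k \exp(2\pi i k\, \hat G(x))$

$\hat b_k = \frac{1}{N_2} \sum_{j=N_1+1}^{N} \exp(-2\pi i k\, \hat G(X_j))$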
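The two passes can be sketched as follows. This is an illustrative implementation under stated assumptions, not the program used for the simulations reported later: the weight $w_k(h) = \exp(-h k^2)$, the truncation at 25 terms, the Beta-distributed test sample, and the values of $N_1$, $h_1$, $h_2$ are all hypothetical. The sketch uses the kernel form of lemma (II.B.2) in both passes, and integrates the kernel term by term to obtain $\hat G$.

```python
import math
import random

KMAX = 25  # truncation of the Fourier sums (an assumption of this sketch)

def weight(k, h):
    """Example weight w_k(h) = exp(-h k^2) (an assumption)."""
    return math.exp(-h * k * k)

def kernel(x, h):
    """Truncated periodic kernel K_h(x)."""
    return 1.0 + 2.0 * sum(weight(k, h) * math.cos(2.0 * math.pi * k * x)
                           for k in range(1, KMAX + 1))

def kernel_integral(x, h):
    """Integral of K_h from 0 to x, computed term by term."""
    return x + 2.0 * sum(weight(k, h) * math.sin(2.0 * math.pi * k * x)
                         / (2.0 * math.pi * k) for k in range(1, KMAX + 1))

def first_pass(sample1, h1):
    """Return the rough estimate g_hat and its integral G_hat, as in (4.1)."""
    n1 = len(sample1)
    def g_hat(x):
        return sum(kernel(x - xj, h1) for xj in sample1) / n1
    def G_hat(x):
        return sum(kernel_integral(x - xj, h1) - kernel_integral(-xj, h1)
                   for xj in sample1) / n1
    return g_hat, G_hat

def two_pass(sample, n1, h1, h2):
    """Return f_hat of (4.2), using the last N - n1 points in the second pass."""
    g_hat, G_hat = first_pass(sample[:n1], h1)
    t_points = [G_hat(xj) for xj in sample[n1:]]   # transformed second-pass data
    def f_hat(x):
        t = G_hat(x)
        r_hat = sum(kernel(t - tj, h2) for tj in t_points) / len(t_points)
        return g_hat(x) * r_hat
    return f_hat

rng = random.Random(7)
sample = [rng.betavariate(2.0, 5.0) for _ in range(60)]  # hypothetical data on (0, 1)
g_hat, G_hat = first_pass(sample[:20], 0.05)
f_hat = two_pass(sample, n1=20, h1=0.05, h2=0.02)
```

By the change of variable $t = \hat G(x)$, the estimate integrates to one over $[0,1]$ whenever the kernel does, which gives a simple sanity check on an implementation.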


Remark

The choice of the parameters $N_1$, $N_2$ and $h_1$, $h_2$ is not specified
above. For theoretical analysis, $h_1$ and $h_2$ will be chosen as
functions of $N_1$, $N_2$ (discussed below in section III.C). In practical
application of the estimator, we will choose $N_1 < N_2$, $h_1 > h_2$ so
that $\hat g(x)$ is a low-resolution estimate of $f$ and $\hat f$ in the second
pass is a high-resolution estimate. There is no way to apply theory
in practical choice of the parameters. As in the case of all other
p.d.f. estimators, we must resort to setting the values by heuristic
means.


III.C  Asymptotic Error Analysis

We will now develop asymptotic error estimates for the estimator
(III.B.4). The development will be in two steps. First we will
derive estimates based on the assumption that $\hat g = g$, a deterministic
function satisfying certain inequalities. Second, we will determine
bounds on the probability that $\hat g$ satisfies these inequalities. Thus
the final estimates will hold "in probability."

Let $G(\cdot)$ be some deterministic function satisfying (III.B.1),
and let $\hat f$ be defined by (III.B.3). We can rewrite the expression
(III.B.3.2) for $\hat b_k$ as

1)  $\hat b_k = \frac{1}{N} \sum_{j=1}^{N} \exp(-2\pi i k T_j)$

where

2)  $T_j = G(X_j)$.

We know that the p.d.f. of the transformed random variable $T = G(X)$
is just (see [19]) $r(\cdot)$ defined by

3)  $r(t) = r(G(x)) = f(x)/g(x)$.
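Relation (3) can be illustrated by simulation. In the sketch below the density $f(x) = 2x$ on $[0,1]$ and the two candidate transformations are assumptions chosen for the example: with $G$ equal to the true c.d.f. ($g = f$) the transformed variable has density $r \equiv 1$, while with $G(x) = x$ ($g \equiv 1$) it keeps the density $r = f$.

```python
import math
import random

def sample_x(n, rng):
    """Draw from the example density f(x) = 2x on [0, 1] by inversion: X = sqrt(U)."""
    return [math.sqrt(rng.random()) for _ in range(n)]

def histogram(values, bins=10):
    """Relative frequencies of values in [0, 1) over equal-width bins."""
    counts = [0] * bins
    for t in values:
        counts[min(int(t * bins), bins - 1)] += 1
    return [cnt / len(values) for cnt in counts]

rng = random.Random(2024)
xs = sample_x(20000, rng)

t_matched = [x * x for x in xs]   # G(x) = x^2 is the true c.d.f., so r = f/g = 1
t_identity = list(xs)             # G(x) = x, g = 1, so r = f

flat = histogram(t_matched)
tilted = histogram(t_identity)
```

The histogram `flat` is nearly uniform, whereas `tilted` still rises from left to right. A flat $r$ is exactly what the leading terms of a Fourier series capture well, which is the motivation for taking $g \approx f$.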


We may consider $\hat r$, a simple Fourier series estimator for $r$,
defined by

4.1)  $\hat r(t) = \sum_{k=-\infty}^{\infty} w_k(h)\, \hat b_k \exp(2\pi i k t)$

4.2)  $\hat b_k = \frac{1}{N} \sum_{j=1}^{N} \exp(-2\pi i k T_j)$.

Since we clearly have

5)  $\hat f(x) = g(x)\, \hat r(G(x))$,

it follows that

6.1)  $\mathrm{var}[\hat f(x)] = g(x)^2\, \mathrm{var}[\hat r(G(x))]$

6.2)  $\mathrm{bias}[\hat f(x)] = g(x)\, \mathrm{bias}[\hat r(G(x))]$.

Putting this together, we have the following

7)  Theorem

Suppose $f$ and $\{w_k(\cdot)\}_{k=-\infty}^{\infty}$ satisfy the hypothesis of
theorem (II.B.8). Let $G$ satisfy (III.B.1) and be smooth enough that
theorem (II.B.8) applies to $r$, and let $\hat f$
be defined by (III.B.3), $r$ by (3), and $\hat r$ by (4).

Then for $x_0 \in [0,1]$ such that $g(x_0) \ne 0$,

$\lim_{h \to 0} \frac{E[\hat f(x_0)] - f(x_0)}{c(h)} = g(x_0)\, r''(t_0)$

where $t_0 = G(x_0)$.

Further, if $h_N \to 0$ as $N \to \infty$, then

$\lim_{N \to \infty} \frac{N\, \mathrm{var}[\hat f(x_0)]}{v(h_N)} = f(x_0)\, g(x_0)$.

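Proof

Applying theorem (II.B.8) to the estimator $\hat r$ we have

$\lim_{h \to 0} \frac{E[\hat r(t_0)] - r(t_0)}{c(h)} = r''(t_0)$

$\lim_{N \to \infty} \frac{N\, \mathrm{var}[\hat r(t_0)]}{v(h_N)} = r(t_0)$.

By (3) and (5) we have

$E[\hat f(x_0)] - f(x_0) = g(x_0) \left( E[\hat r(t_0)] - r(t_0) \right)$

so that

$\lim_{h \to 0} \frac{E[\hat f(x_0)] - f(x_0)}{c(h)} = g(x_0)\, r''(t_0)$.

Also,

$N\, \mathrm{var}[\hat f(x_0)] = g(x_0)^2\, N\, \mathrm{var}[\hat r(t_0)]$.

Thus

$\lim_{N \to \infty} \frac{N\, \mathrm{var}[\hat f(x_0)]}{v(h_N)} = g(x_0)^2\, r(t_0) = f(x_0)\, g(x_0)$.

□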


We can see by the preceding theorem that the quantity $r''(t_0)$ is
of interest in the asymptotic error of $\hat f(x_0)$. We will spend some
time examining $r''$ and its dependence on the transformation $G$.

8)  Lemma

Let $f, g \in C^2[0,1]$ be p.d.f.'s.

Define

$G(x) = \int_0^x g(y)\,dy$

and for $x \in [0,1]$ such that $g(x) > 0$

$r(G(x)) = f(x)/g(x)$.

Let $x_0 \in (0,1)$ with $g(x_0) > 0$, and $t_0 = G(x_0)$.

Then

$r''(t_0) = \left. \frac{d^2 r}{dt^2} \right|_{t=t_0} = \frac{1}{g(x_0)^5} \left[ g(x_0)^2 f''(x_0) - g(x_0) f(x_0) g''(x_0) + 3 f(x_0) [g'(x_0)]^2 - 3 g(x_0) f'(x_0) g'(x_0) \right]$.

The proof of this lemma, a straightforward calculation, is
omitted. We now establish a bound on $r''(t_0)$ under the assumption
that $g \approx f$.
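For readers who wish to check the formula of lemma (8), the calculation can be sketched as follows. It uses only the chain rule with $t = G(x)$ and $dt/dx = g(x)$, and is offered as a verification aid rather than as the omitted proof itself; arguments $x_0$ are suppressed.

```latex
\begin{aligned}
\frac{dr}{dt} &= \frac{1}{g}\,\frac{d}{dx}\!\left(\frac{f}{g}\right)
              = \frac{f'g - fg'}{g^{3}},\\[4pt]
\frac{d^{2}r}{dt^{2}} &= \frac{1}{g}\,\frac{d}{dx}\!\left(\frac{f'g - fg'}{g^{3}}\right)
              = \frac{(f''g - fg'')\,g - 3g'\,(f'g - fg')}{g^{5}}\\[4pt]
             &= \frac{g^{2}f'' - g f g'' + 3 f (g')^{2} - 3 g f' g'}{g^{5}}.
\end{aligned}
```

When $g = f$ the numerator vanishes identically, in agreement with $r \equiv 1$; this is the exact case that the following bound perturbs.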

9)  Lemma

With the same hypothesis as lemma (8), suppose further that
we have

$|g^{(k)}(x_0) - f^{(k)}(x_0)| \le A \le 1$, for $k = 0, 1, 2$.

Let $B(f,x_0) = \max\{ 1, f(x_0), |f'(x_0)|, |f''(x_0)| \}$.

Then at $t_0 = G(x_0)$ we have

$|r''(t_0)| \le \frac{24\, A\, B(f,x_0)^2}{g(x_0)^5}$.

Proof

For convenience, we will write $f$ for $f(x_0)$, etc.

We have by lemma (8),

$r''(t_0) = \frac{1}{g^5} \left[ g^2 f'' - g g'' f + 3 g'^2 f - 3 g g' f' \right] = \frac{1}{g^5} \left\{ g \left[ g f'' - f g'' \right] + 3 g' \left[ f g' - g f' \right] \right\}$.

We will make use of the easily verified inequality

$|pq - rs| \le \frac{1}{2} |p - r| \cdot |q + s| + \frac{1}{2} |p + r| \cdot |q - s|$.

First,

$|g f'' - f g''| \le \frac{1}{2} |g - f|\, |f'' + g''| + \frac{1}{2} |g + f|\, |f'' - g''| \le \frac{1}{2} A (2B + A) + \frac{1}{2} (2B + A) A \le 3AB$.

Second,

$|f g' - g f'| \le \frac{1}{2} |f - g|\, |f' + g'| + \frac{1}{2} |f + g|\, |f' - g'| \le \frac{1}{2} A (2B + A) + \frac{1}{2} (2B + A) A \le 3AB$.

Moreover,

$g = f + g - f \le |f| + |g - f| \le B + A \le 2B$

$|g'| = |f' + g' - f'| \le B + A \le 2B$.

Thus

$|r''(t_0)| \le \frac{2B \cdot 3AB + 3 \cdot 2B \cdot 3AB}{g^5} = \frac{24 A B^2}{g^5}$.

□
We now collect what we have so far into a theorem giving
asymptotic error estimates under the assumption $g \approx f$.

10)  Theorem

Suppose

10.1)  $f \in C^3[0,1]$ and vanishes in a neighborhood of the
endpoints.

10.2)  $\{w_k(\cdot)\}_{k=-\infty}^{\infty}$ satisfies conditions (II.B.3).

10.3)  $G$ satisfies (III.B.1) and the smoothness hypothesis of theorem (7).

Let $g(x) = G'(x)$, $\hat f$ be defined by (III.B.3),
and $x_0 \in (0,1)$ such that $f(x_0) \ne 0$.

Choose numbers $0 < p < 1$ and $0 < A < p f(x_0)$.

Suppose moreover that

$|g^{(k)}(x_0) - f^{(k)}(x_0)| \le A$ for $k = 0, 1, 2$.

Then we have

10.4)  $\lim_{h \to 0} \left| \frac{E[\hat f(x_0)] - f(x_0)}{c(h)} \right| \le \frac{24\, A\, B(f,x_0)^2}{(1-p)^4 f(x_0)^4}$

where

$B(f,x_0) = \max\{ 1, |f^{(k)}(x_0)| \}$  $(k = 0, 1, 2)$.

Furthermore, if $h_N \to 0$ as $N \to \infty$, then

10.5)  $\left| \lim_{N \to \infty} \frac{N\, \mathrm{var}[\hat f(x_0)]}{v(h_N)} - f(x_0)^2 \right| \le A f(x_0)$.

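Proof

By theorem (7) we have

$\lim_{h \to 0} \frac{E[\hat f(x_0)] - f(x_0)}{c(h)} = g(x_0)\, r''(t_0)$

and

$\lim_{N \to \infty} \frac{N\, \mathrm{var}[\hat f(x_0)]}{v(h_N)} = f(x_0)\, g(x_0)$.

By lemma (9) we have

$|r''(t_0)| \le \frac{24 A B^2}{g(x_0)^5}$.

Thus

$\lim_{h \to 0} \left| \frac{E[\hat f(x_0)] - f(x_0)}{c(h)} \right| \le \frac{24 A B^2}{g(x_0)^4}$.

Since

$\frac{1}{g(x_0)} \le \frac{1}{f(x_0) - A} \le \frac{1}{f(x_0) - p f(x_0)}$,

we obtain

$\lim_{h \to 0} \left| \frac{E[\hat f(x_0)] - f(x_0)}{c(h)} \right| \le \frac{24 A B^2}{(1-p)^4 f(x_0)^4}$

which is (10.4).

(10.5) follows immediately since

$|f(x_0)\, g(x_0) - f(x_0)^2| \le A f(x_0)$.

□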

Now let us return to the adaptive estimator (III.B.4). We
know that $\hat g(x_0)$ is a consistent estimator for $f(x_0)$, by theorem
II.B.8 (with proper choice of $h_1 = h_{N_1}$). The next theorem extends
consistency to the first and second derivative. First, however, we
define

11.1)  For $k = 0, 1, 2$,

$v_k(h) = \int_{-1/2}^{1/2} K_h^{(k)}(x)^2\,dx$,

where $K_h^{(k)} = \frac{d^k}{dx^k} K_h(x)$.

Note $v_0(h) = v(h)$.

11.2)  $V(h) \triangleq \max\{ v_0(h), v_1(h), v_2(h) \}$.

12)  Theorem

Let $\hat g$ be defined by (III.B.4).

Suppose that the kernel $K_h$ associated with
$\{w_k\}$ is in $C^2[0,1]$, and $f \in C^5[0,1]$ vanishes in a
neighborhood of the endpoints.

Define for $x \in (0,1)$ and $k = 0, 1, 2$

$\hat g^{(k)}(x) = \frac{d^k}{dx^k}\, \hat g(x)$.

Choose $h_1 = h_{N_1}$ to satisfy

$\frac{V(h_{N_1})}{c(h_{N_1})^2} = N_1$.

Then for $x_0 \in (0,1)$, $E[\hat g^{(k)}(x_0) - f^{(k)}(x_0)]^2 \to 0$
as $N_1 \to \infty$.


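Proof

We can write $\hat g(x) = \frac{1}{N_1} \sum_{j=1}^{N_1} K_{h_1}(x - X_j)$.

Since $K_h \in C^2[0,1]$,

$\hat g^{(k)}(x) = \frac{1}{N_1} \sum_{j=1}^{N_1} K_{h_1}^{(k)}(x - X_j)$

exists.

Now by integration by parts, we get

$E[\hat g^{(1)}(x_0)] = \int_0^1 K_{h_1}^{(1)}(x_0 - y) f(y)\,dy = \left. -K_{h_1}(x_0 - y) f(y) \right|_0^1 + \int_0^1 K_{h_1}(x_0 - y) f^{(1)}(y)\,dy = \int_0^1 K_{h_1}(x_0 - y) f^{(1)}(y)\,dy$,

the boundary term vanishing because $f$ vanishes in a neighborhood of the endpoints.

A similar result holds for $E[\hat g^{(2)}(x_0)]$.

Thus for $k = 0, 1, 2$, we obtain by previous methods

$E[\hat g^{(k)}(x_0)] = \int_0^1 K_{h_1}(x_0 - y) f^{(k)}(y)\,dy$

$= \int_{-1/2}^{1/2} K_{h_1}(y) \left[ f^{(k)}(x_0) + f^{(k+1)}(x_0)\, y + \frac{1}{2} f^{(k+2)}(x_0)\, y^2 + \frac{1}{3!} f^{(k+3)}(z(y))\, y^3 \right] dy$

$= f^{(k)}(x_0) + f^{(k+2)}(x_0)\, c(h_1) + \int_{-1/2}^{1/2} K_{h_1}(y)\, \frac{y^3}{3!}\, f^{(k+3)}(z(y))\,dy$.

Thus we have an estimate for the bias

$|E[\hat g^{(k)}(x_0)] - f^{(k)}(x_0)| \le |f^{(k+2)}(x_0)|\, c(h_1) + \sup_{x \in (0,1)} |f^{(k+3)}(x)|\, c(h_1)$,

since

$\int_{-1/2}^{1/2} K_{h_1}(y)\, \frac{|y|^3}{3!}\,dy \le c(h_1)$.

For the variance we have

$\mathrm{var}[\hat g^{(k)}(x_0)] = \frac{1}{N_1}\, \mathrm{var}[K_{h_1}^{(k)}(x_0 - X)] \le \frac{1}{N_1} \int_0^1 K_{h_1}^{(k)}(x_0 - y)^2 f(y)\,dy$.

Again the Taylor expansion with remainder yields

$\int_0^1 K_{h_1}^{(k)}(x_0 - y)^2 f(y)\,dy = v_k(h_1)\, f(x_0) + \frac{1}{2} \int_{-1/2}^{1/2} K_{h_1}^{(k)}(y)^2\, y^2\, f''(z(y))\,dy$.

So

$\mathrm{var}[\hat g^{(k)}(x_0)] \le \frac{1}{N_1} \left\{ v_k(h_1)\, f(x_0) + \frac{1}{2} \sup_x |f''(x)| \int_{-1/2}^{1/2} K_{h_1}^{(k)}(y)^2\, y^2\,dy \right\}$.

Hence by the indicated choice $h_1 = h_{N_1}$,

$E[\hat g^{(k)}(x_0) - f^{(k)}(x_0)]^2 = |E[\hat g^{(k)}(x_0)] - f^{(k)}(x_0)|^2 + \mathrm{var}[\hat g^{(k)}(x_0)] \to 0$

as $N_1 \to \infty$.

□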


We can now state the final and chief result on the asymptotic
error of the adaptive estimator.

13)  Theorem

Suppose

13.1)  $f \in C^5[0,1]$ and vanishes in a neighborhood of the

Proof

Recalling  the  notation  of  theorem  (10),  let  us  pick  A so  that 
0 A < ^(Xq)  ♦ 0 < A < , and 

24  A BCf.Xp)^ 
f(xo)'' 

Then by theorem (10), if

Thus there is some such that

Thus, for this , bounds (13.7) and (13.8) fail to hold with
probability $\le$ .

Discussion


We now consider an intuitive interpretation of theorem (13).
For this purpose, let us denote by $\hat f_1$ the simple Fourier series
estimator defined in (II.B.1) and by $\hat f_2$ the adaptive estimator
(III.B.4).

We have seen from theorem (II.B.8) that for large $N$ the
bias satisfies $|E[\hat f_1(x_0)] - f(x_0)| \approx |f''(x_0)|\, c(h_N)$.

Theorem (13) gives the analogous result

$|E[\hat f_2(x_0)] - f(x_0)| \lesssim \varepsilon\, c(h_{N-N_1})$.

The factor of proportionality $\varepsilon$ can be made as small as desired,
such as $\varepsilon \ll |f''(x_0)|$, by reserving enough samples in the
first pass. Now if the ratio

$\frac{c(h_N)}{c(h_{N-N_1})} \to 1$ as $N \to \infty$, $N_1$ fixed,

then the asymptotic bias of $\hat f_2(x_0)$ is smaller than that of
$\hat f_1(x_0)$.

IV.  Computer Simulations

In  section  III  we  have  developed  an  asymptotic  error  analysis 
for  the  adaptive  estimator  which  describes  large-sample  behavior. 

The  asymptotic  approximations  made  are  not  valid  for  small  samples. 

Yet  it  is  the  case  of  small  samples  which  is  most  important  in  practice. 
Hence  we  must  turn  to  computer  simulations  to  demonstrate  the  behavior 
for  small  samples. 

In  the  following  simulations  we  consider  a mixture  of  two 
Gaussians 


1) $f(x) = 0.78\, f_1(x) + 0.22\, f_2(x)$

where $f_1$ is N(0,1) and $f_2$ is N(1.6,0.4).

The sample set consists of N = 100 independent variates drawn from
this density, generated by a standard (polar method) pseudo-random
number generator.
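As an illustration of the test density, the following is a minimal sketch of how such a sample set could be drawn. It is not the original program: it uses NumPy's default generator rather than the polar method, and it assumes that the second parameter of N(1.6,0.4) is a standard deviation (under that reading the density has two modes near x=0 and x=1.6, which agrees with the description in the text). The names `sample_mixture`, `mixture_pdf`, and `second_scale` are hypothetical.

```python
import numpy as np


def sample_mixture(n, rng, second_scale=0.4):
    """Draw n variates from 0.78*N(0,1) + 0.22*N(1.6, second_scale)."""
    # Choose the mixture component of each variate (probability 0.22 for the second).
    from_second = rng.random(n) < 0.22
    x = rng.normal(0.0, 1.0, size=n)
    # ASSUMPTION: second_scale is treated as a standard deviation, not a variance.
    x[from_second] = rng.normal(1.6, second_scale, size=int(from_second.sum()))
    return x


def mixture_pdf(x, second_scale=0.4):
    """Evaluate the mixture density under the same assumption."""
    x = np.asarray(x, dtype=float)

    def normal_pdf(t, mean, scale):
        return np.exp(-0.5 * ((t - mean) / scale) ** 2) / (scale * np.sqrt(2.0 * np.pi))

    return 0.78 * normal_pdf(x, 0.0, 1.0) + 0.22 * normal_pdf(x, 1.6, second_scale)


rng = np.random.default_rng(0)
sample = sample_mixture(100, rng)  # one sample set of N = 100 variates
```

Under the stated assumption, `mixture_pdf` is lower at x=0.9 than at either x=0 or x=1.6, which is the shallow valley between the two modes.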

This p.d.f. was chosen as a test case because it has two closely
spaced modes separated by a shallow valley (see figure IV.1). The
adaptive estimator promises reduced bias, and hence it should be able
to resolve the modes better than the conventional Fourier series
estimator.


In the theoretical (asymptotic) analysis in section III, we
partitioned the sample set into two parts. The first part was used
in the first pass, and the second part was used in the second pass.
The partitioning greatly simplified the theoretical analysis. However,
in small-sample-set numerical trials, it was found that performance of
the estimator improved if the entire sample was used in both passes.




The numerical trials reported below were conducted in this manner.
Specifically, for a sample set $\{X_1, \ldots, X_N\}$ (N=100), the
estimator was implemented as follows:


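2.1) $g(x) = \sum_{k=0}^{20} (1-h)^k\, a_k \cos 2\pi k x$

2.2) $a_k = \frac{2}{N} \sum_{j=1}^{N} \cos 2\pi k X_j \quad (k \ge 1)$

2.3) $G(x) = \int_0^x g(y)\, dy$

2.4) $\hat f_2(x) = g(x) \sum_{k=0}^{5} b_k \cos(2\pi k G(x))$

2.5) $b_k = \frac{2}{N} \sum_{j=1}^{N} \cos(2\pi k G(X_j)) \quad (k \ge 1)$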

(The  expansions  employ  only  cosines  in  order  to  simplify  the 
computer  program.) 
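The two-pass construction can be sketched in a few lines of NumPy. This is a reading of the scheme under stated assumptions, not the original program: the data are assumed to be already mapped into [0,1], the zeroth coefficient of each series is taken to be 1, the first-pass taper is taken to be (1-h)^k (chosen because it gives g(x)=1 at h=1, as the text states), and G is obtained by integrating the cosine series term by term. All function names are hypothetical.

```python
import numpy as np


def cosine_coeffs(u, kmax):
    """Return [1, c_1, ..., c_kmax] with c_k = (2/N) * sum_j cos(2*pi*k*u_j)."""
    k = np.arange(1, kmax + 1)
    c = 2.0 * np.cos(2.0 * np.pi * np.outer(k, u)).mean(axis=1)
    return np.concatenate(([1.0], c))  # ASSUMPTION: zeroth coefficient equals 1


def simple_estimate(x, data, n):
    """Truncated cosine-series estimate with n harmonics."""
    k = np.arange(n + 1)
    return cosine_coeffs(data, n) @ np.cos(2.0 * np.pi * np.outer(k, x))


def adaptive_estimate(x, data, h, kmax=20, n=5):
    """Two-pass estimate: first-pass g, its integral G, then a series in G(x)."""
    x = np.asarray(x, dtype=float)
    data = np.asarray(data, dtype=float)
    k = np.arange(kmax + 1)
    # ASSUMPTION: first-pass weights are (1 - h)**k, so h = 1 leaves only k = 0.
    w = (1.0 - h) ** k * cosine_coeffs(data, kmax)

    def g(t):
        return w @ np.cos(2.0 * np.pi * np.outer(k, t))

    def G(t):
        # Term-by-term integral of g from 0 to t.
        s = np.sin(2.0 * np.pi * np.outer(k[1:], t))
        return w[0] * t + (w[1:] / (2.0 * np.pi * k[1:])) @ s

    b = cosine_coeffs(G(data), n)  # second-pass coefficients on transformed data
    m = np.arange(n + 1)
    return g(x) * (b @ np.cos(2.0 * np.pi * np.outer(m, G(x))))
```

With h=1 this sketch reduces to the plain truncated cosine series with the same number of terms. For other h, the estimate still integrates to one over [0,1] because G(0)=0 and G(1)=1, although g (and hence the estimate) may take negative values.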

The adaptive estimator will be compared to the simple
Kronmal-Tarter type defined by

3.1) $\hat f_1(x) = \sum_{k=0}^{n} c_k \cos 2\pi k x$

3.2) $c_k = \frac{2}{N} \sum_{j=1}^{N} \cos 2\pi k X_j \quad (k \ge 1)$
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To  make  this  comparison  more  direct,  in  (2.4)  we  have  chosen  a 


weight sequence corresponding to simple truncation. (The truncation
point 5 was chosen by trial and error.) Note that for h=1, the
estimator $\hat f_2$ is identical to $\hat f_1$ for n=5. Below we will observe the
effect of varying h and n.

The  results  of  the  trials  will  be  presented  in  two  ways.  First, 
we  will  examine  the  estimates  obtained  from  one  fixed  sample  set 
as h varies for $\hat f_2$ and n varies for $\hat f_1$. These estimates are shown in
graphical form in figures IV.2 through IV.7. Second, the integrated
square error

$$\int (\hat f(x) - f(x))^2\, dx$$

will be computed for 25 sample sets, and statistically reliable
conclusions will be drawn.
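The integrated square error can be approximated by any standard quadrature rule. The sketch below uses the trapezoidal rule on a uniform grid; the rule, the grid size, and the integration limits are assumptions, since the text does not say which numerical integration scheme was used. The function name `integrated_square_error` is hypothetical.

```python
import numpy as np


def integrated_square_error(estimate, truth, a, b, num=2001):
    """Trapezoidal approximation of the integral of (estimate - truth)^2 over [a, b]."""
    x = np.linspace(a, b, num)
    d2 = (estimate(x) - truth(x)) ** 2
    return float(np.sum(0.5 * (d2[1:] + d2[:-1]) * np.diff(x)))


def normal_pdf(x, mean=0.0, scale=1.0):
    return np.exp(-0.5 * ((x - mean) / scale) ** 2) / (scale * np.sqrt(2.0 * np.pi))


# Check against a case with a closed form: two unit-variance normals whose
# means differ by 0.5 have ISE (1 - exp(-0.5**2 / 4)) / sqrt(pi).
ise = integrated_square_error(lambda x: normal_pdf(x, 0.5), normal_pdf, -10.0, 10.0)
```

In use, `estimate` would be a density estimate evaluated on the grid and `truth` the known test density.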

Figure 2 shows the result for $\hat f_2$ and h=1. This is the trivial
case, since for this choice of h, g(x) = 1; it is identical to a
simple Fourier series estimate. Note that the estimate $\hat f_2$ does not
resolve the two modes of f. Also we see a substantial negative tail
at the right of the graph. The negativity is a result of truncating
rather than tapering the series terms in (2.4).

Figure 3 shows the results for h = 0.4. Now g begins to
concentrate mass near the modes of f. We see that $\hat f_2$ begins to
resolve the modes and that the negative tail is somewhat reduced.

In figure 4, h equals 0.25. Now $\hat f_2$ does a very good job of
resolving the modes, and the negative tail is almost eliminated.


Clearly, figure 4 is a much better estimate than figure 2. By
allowing the estimator to adapt (as h varies) we have greatly
reduced the bias.

One may wonder how well the simple Fourier estimator (3)
would perform if we vary n. The case of n=5 is shown in figure 5.
(This is in fact the same estimate as in figure 2.) Now as we
increase to n=7 (figure 6) and to n=10 (figure 7), the performance
is improved. However, even in the best case (n=10), the simple
Fourier series estimator is inferior to the adaptive estimator. Note
in particular that the simple estimator is able to resolve the
modes in figure 7 only at the expense of introducing spurious modes
(and negative values) in the tails. This behavior is characteristic,
since the simple series estimator provides a constant amount of
resolution over the entire interval [a,b]. The adaptive estimator,
on the other hand, tunes its resolution to the data; it provides
higher resolution where the density of the data is higher.
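One way to make the resolution statement concrete, using only definition (2.3), is to look at how fast the phase of the k-th transformed basis function advances. The term "local frequency" is introduced here for illustration and is not the paper's terminology.

```latex
\frac{d}{dx}\bigl[\,2\pi k\,G(x)\,\bigr] = 2\pi k\,g(x)
\quad\Longrightarrow\quad
\text{local frequency of } \cos\bigl(2\pi k\,G(x)\bigr) \text{ at } x \;=\; k\,g(x).
```

For the simple estimator G(x) = x, so the k-th term oscillates at the constant frequency k everywhere. In the adaptive estimator the same term oscillates faster where the first-pass estimate g is large and slower where it is small.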

Next, we examine some Monte Carlo estimates of the integrated
mean square error of $\hat f_1$ and $\hat f_2$. Twenty-five sample sets, each set
consisting of one hundred variates, were independently generated.
For the i-th sample set (i=1,...,25), estimates $\hat f_{1,i}$ and $\hat f_{2,i}$ were
obtained. For each estimate, the integrated square error

4) $e_{ki} = \int (\hat f_{k,i}(x) - f(x))^2\, dx \quad (k=1,2;\ i=1,\ldots,25)$

was computed by numerical integration. These errors are tabulated in
table IV.1.

Column A is the result for the adaptive estimator $\hat f_2$ with
h=0.25. The average is 0.0078 with standard deviation 0.0043.
Compare this with column B, the result for the simple Fourier series
estimator $\hat f_1$ with n=5. For the latter, $\bar e_1 = 0.0099$ with standard
deviation 0.0028.

For these trials, the average integrated squared error for $\hat f_2$
is substantially less than that for $\hat f_1$. Since n=5, the only difference
between the two estimators is the preprocessing step (2.1 - 2.3).
This clearly shows the improvement obtained by the prior transformation.


We would like to test the difference in the averages of $e_1$ and
$e_2$ for statistical significance. Since the random variables $e_{ki}$ have
no readily identifiable distribution, we will employ a distribution-free
sign test for the median difference (see [22]). Consider
the null hypothesis

H: median $(e_1 - e_2) = 0$

against the alternative

A: median $(e_1 - e_2) > 0$.

Clearly if H is true then $e_2 > e_1$ is as likely as $e_2 < e_1$ and
$\hat f_2$ is no better than $\hat f_1$. If A is true, however, then $e_2 < e_1$ is more
likely.

Comparing columns A and B, we find $e_{2i} < e_{1i}$ occurs 22 times,
with the reverse occurring three times. Referring to the one-tailed
cumulative binomial distribution we see that H may be rejected with
significance 0.001.
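The binomial tail used in the sign test is easy to reproduce. The following sketch computes the one-tailed probability of observing at least the given number of favorable comparisons when each outcome is equally likely under H; the function name `sign_test_p` is hypothetical, and ties are assumed absent.

```python
from math import comb


def sign_test_p(wins, trials):
    """One-tailed P(X >= wins) for X ~ Binomial(trials, 1/2)."""
    return sum(comb(trials, k) for k in range(wins, trials + 1)) / 2 ** trials


p = sign_test_p(22, 25)  # 22 favorable comparisons out of 25
print(f"{p:.2e}")        # about 7.8e-05
```

The exact tail for 22 of 25 is 2626 / 2^25, which is well below the 0.001 level quoted above.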

Next we compare $\hat f_2$ to $\hat f_1$ for n=10 (column C). Here again the
average $\bar e_2 < \bar e_1$. However, the sign test is not significant for 25
trials. Therefore, another 25 trials were run and the results are
tabulated in table IV.2. Applying the sign test for the 50 trials
yields 34 occurrences of $e_{2i} < e_{1i}$, with the reverse in the remaining trials.

Thus we may reject H with significance 0.01.

Column D tabulates the results of 25 trials for $\hat f_1$ with n=7.
Note that $\bar e_1 = 0.0076$, which is not significantly different from
$\bar e_2$. Thus, in mean-square error alone, $\hat f_2$ is not better than $\hat f_1$
for n=7. However, by another performance measure, $\hat f_2$ is substantially
better. One important task of a p.d.f. estimator is to resolve and
estimate the location of the modes of the p.d.f. Thus, let us
define another error measure m equal to the sum of the squared
distances from the true modes (located at x=0 and x=1.6) to the
nearest modes of the estimate. Thus if $\hat f$ has modes at x=-0.2 and
1.4, then $m = (-0.2-0)^2 + (1.4-1.6)^2 = 0.08$; if $\hat f$ is unimodal with
mode at, say, x=1.0, then $m = (1-0)^2 + (1-1.6)^2 = 1.36$. Errors
$m_{2i}$ for $\hat f_2$ and $m_{1i}$ for $\hat f_1$ (n=7) are tabulated in table IV.3 for
the 25 trials. The average $\bar m_2 = 0.31$, which is substantially less
than $\bar m_1 = 1.04$. Note that $\hat f_1$ failed to resolve the modes (that is,
$\hat f_1$ was unimodal) in 12 of the 25 trials; $\hat f_2$ failed to resolve in only
2 trials. Thus, although $\hat f_1$ with n=7 performs as well as $\hat f_2$ in the
"average" measure of integrated square error, $\hat f_2$ provides greatly
enhanced resolution (that is, lower bias). Applying the median
difference sign test to table IV.3 yields a significance of 0.02.
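The mode-location error m can be computed as in the sketch below. The helper `find_modes`, which takes the modes of an estimate to be the interior local maxima on an evaluation grid, is an assumption made for illustration; the text does not say how the modes of each estimate were located. Both function names are hypothetical.

```python
import numpy as np


def find_modes(x, y):
    """Grid points where y is strictly larger than both neighbors."""
    inner = (y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])
    return x[1:-1][inner]


def mode_error(estimated_modes, true_modes=(0.0, 1.6)):
    """Sum over true modes of the squared distance to the nearest estimated mode."""
    return sum(min((t - m) ** 2 for m in estimated_modes) for t in true_modes)


m_bimodal = mode_error([-0.2, 1.4])  # (-0.2 - 0)^2 + (1.4 - 1.6)^2
m_unimodal = mode_error([1.0])       # (1 - 0)^2 + (1 - 1.6)^2
```

The two calls reproduce the worked examples in the text, 0.08 and 1.36, up to floating-point rounding.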
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•- 



V.  Summary  and  Conclusions 

We have looked in detail at the orthogonal-series type of
estimator and at its asymptotic error analysis. The main contribution
of this paper is the proposal of a new estimator. This estimator is
constructed by means of a prior data-dependent transformation of the
basis in order to reduce the bias of the estimate. We have developed
an asymptotic error analysis of the adaptive estimator; and to
demonstrate the small-sample behavior of the estimator, we have
considered some computer implementations.

As we see from both the error analysis and the computer
simulations, there is an advantage to be gained from performing the
data-dependent transformation. Resolution is improved (bias is reduced)
in comparison to the conventional Fourier-series estimator. This
improvement could be of significance in pattern-recognition applications.
As shown in the computer simulations, the adaptive estimator was able
to resolve closely-spaced modes without introducing spurious modes in
the tails of the densities. In pattern recognition we are interested
in ratios of probability density functions. The ability to detect the
fine structure of densities from a limited set of samples can lead
to improved discriminant functions (and hence a lower rate of
misclassification).


ACKNOWLEDGEMENT 




It  is  a pleasure  to  acknowledge  the  many  helpful  discussions 
with  Professor  James  R.  Thompson  on  the  topic  of  this  research;  we 
greatly  appreciated  his  encouragement  and  constructive  criticism. 

We  would  also  like  to  acknowledge  with  thanks  the  financial  support 
of  the  NSF  grant  ENG  74-17955  and  the  AFOSR  grant  75-2777. 




TABLE  IV. 2 

"Integrated  Squared  Error  (continued)" 


Standard 

Deviation 
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TABLE  IV. 3 


"Error  in  Location  of  Modes" 


Trial    $m_{2i}$ for $\hat f_2$ (h = 0.25)    $m_{1i}$ for $\hat f_1$ (n = 7)

  1        .06                        .11
  2        .08                       2.85*
  3        .39                       4.23*
  4        .42                        .34
  5        .32                       1.24*
  6        .22                        .26
  7        .16                       1.31*
  8        .39                       1.16*
  9        .03                        .13
 10        .01                        .12
 11        .03                       1.70*
 12        .03                        .19
 13       1.54*                      1.41*
 14        .03                        .12
 15        .32                        .16
 16        .26                        .26
 17        .33                       2.32*
 18        .62                        .58
 19        .08                       2.57*
 20        .34                        .31
 21        .26                        .34
 22        .05                        .16
 23        .32                       1.54*
 24       1.41*                      1.18*
 25        .01                       1.48*

Mean      0.31                       1.04

* Estimate was unimodal
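As a cross-check on the summary figures, the short script below recomputes the column means and the sign-test counts from the tabulated values. How the single tied trial was treated in the original analysis is not stated, so both conventions are computed; that choice, and all variable names, are assumptions of this sketch.

```python
from math import comb

# Table IV.3, trials 1 to 25: mode-location error for the adaptive estimator
# (h = 0.25) and for the simple estimator (n = 7).
m2 = [.06, .08, .39, .42, .32, .22, .16, .39, .03, .01, .03, .03, 1.54,
      .03, .32, .26, .33, .62, .08, .34, .26, .05, .32, 1.41, .01]
m1 = [.11, 2.85, 4.23, .34, 1.24, .26, 1.31, 1.16, .13, .12, 1.70, .19, 1.41,
      .12, .16, .26, 2.32, .58, 2.57, .31, .34, .16, 1.54, 1.18, 1.48]

mean2 = sum(m2) / len(m2)
mean1 = sum(m1) / len(m1)

wins = sum(a < b for a, b in zip(m2, m1))    # adaptive estimator closer to the modes
losses = sum(a > b for a, b in zip(m2, m1))
ties = len(m2) - wins - losses


def upper_tail(k, n):
    """P(X >= k) for X ~ Binomial(n, 1/2)."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n


p_ties_dropped = upper_tail(wins, wins + losses)  # ASSUMPTION: tied trial discarded
p_ties_kept = upper_tail(wins, len(m2))           # ASSUMPTION: all 25 trials retained
```

The means round to 0.31 and 1.04, matching the table. The comparison counts are 18, 6, and 1 tie, and the two tail probabilities come out near 0.011 and 0.022, both of the order of the 0.02 quoted in the text.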
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